Scalable Transaction Management with Serializable Snapshot Isolation on HBase
Abstract
Key-value data storage systems such as HBase and Bigtable provide higher scalability and availability than traditional relational databases. However, unlike relational databases, existing key-value stores provide only limited transactional functionality, such as single-row transactions. In this paper, we address the problem of building scalable transaction management mechanisms for multi-row ACID transactions on data storage systems such as HBase. To support scalability and availability, the transaction management functions are decoupled from the data storage system. Furthermore, our design does not depend on any central transaction management layer; instead, these functions and the related recovery actions are performed in a decentralized and cooperative manner by the application-level processes executing the transactions. Our protocol uses the snapshot isolation model because it provides more concurrency. Since the basic snapshot isolation model does not guarantee serializability, our protocol uses a technique based on identifying dependency cycles among transactions to avoid serialization anomalies. The protocol for supporting serializability is also performed in a decentralized manner. We demonstrate the scalability and robustness of our approach, as well as the correctness of the protocol in ensuring serializable executions of transactions on HBase. We also present and evaluate an alternative approach based on a hybrid model in which certain functions, such as conflict detection, are performed by a dedicated service.
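To make the idea of dependency cycles concrete, the following is a minimal Python sketch of the classic write-skew anomaly that snapshot isolation permits. It is not the protocol described in the paper: the `Txn` record, the logical timestamps, and the helpers `concurrent`, `rw_edges`, and `has_cycle` are all hypothetical names introduced for illustration, and the sketch tracks only read-write antidependencies between concurrent transactions, whereas a full serializability check would also consider write-read and write-write dependencies.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Txn:
    """Hypothetical transaction record with logical start/commit timestamps."""
    name: str
    start: int
    commit: int
    reads: frozenset
    writes: frozenset


def concurrent(a, b):
    """Two transactions are concurrent if their lifetimes overlap."""
    return a.start < b.commit and b.start < a.commit


def rw_edges(txns):
    """Build reader -> writer edges: the reader saw a snapshot that predates
    the concurrent writer's update, so the reader must come first in any
    equivalent serial order."""
    edges = defaultdict(set)
    for reader, writer in permutations(txns, 2):
        if concurrent(reader, writer) and reader.reads & writer.writes:
            edges[reader.name].add(writer.name)
    return edges


def has_cycle(edges):
    """Depth-first search with three colours; reaching a grey node means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)

    def visit(node):
        colour[node] = GREY
        for nxt in edges.get(node, ()):
            if colour[nxt] == GREY:
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(edges))


# Write skew: both transactions read x and y, then write disjoint items.
# Snapshot isolation sees no write-write conflict and would let both commit.
t1 = Txn("T1", start=1, commit=3, reads=frozenset("xy"), writes=frozenset("x"))
t2 = Txn("T2", start=2, commit=4, reads=frozenset("xy"), writes=frozenset("y"))

print(has_cycle(rw_edges([t1, t2])))  # True: T1 -> T2 -> T1
```

In this sketch a detected cycle signals that the interleaving has no equivalent serial order, so one of the participating transactions would have to abort. How and where such a check runs (for example, in the cooperating application-level processes or in a dedicated service) is a design decision of the paper's protocol that this illustration does not attempt to reproduce.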
Related resources
HBaseSI: Multi-row Distributed Transactions with Global Strong Snapshot Isolation on Clouds
This paper presents the “HBaseSI” client library, which provides global strong snapshot isolation (SI) for multi-row distributed transactions in HBase. This is the first strong SI mechanism developed for HBase. HBaseSI uses novel methods in handling distributed transactional management autonomously by individual clients. These methods greatly simplify the design of HBaseSI and can be generalize...
Towards serializable replication with snapshot isolation
Replicated database systems necessarily deal with multiple versions of data items active concurrently across nodes in a replication group. As a consequence, there is a natural fit between replication and snapshot isolation (SI), which uses multiple versions of data within a single site to provide nonblocking read operations. However, snapshot isolation does not guarantee serializable execution ...
Serializable Snapshot Isolation in Shared-Nothing, Distributed Database Management Systems
NoSQL data storage systems provide high scalability and availability in exchange for limited transactional guarantees. In many cases, however, an application cannot give up transactional support but still needs the scalability provided by such systems. One approach for overcoming this limitation is to implement Snapshot Isolation (SI) on top of these systems. SI prevents most non-serializable e...
Serializable Snapshot Isolation for Replicated Databases in High-Update Scenarios
Many proposals for managing replicated data use sites running the Snapshot Isolation (SI) concurrency control mechanism, and provide 1-copy SI or something similar, as the global isolation level. This allows good scalability, since only ww-conflicts need to be managed globally. However, 1-copy SI can lead to data corruption and violation of integrity constraints [5]. 1-copy serializability is t...
Modelling Snapshot Isolation Performance
The Snapshot Isolation (SI) level is extensively used in commercial database systems. We developed a simple SI implementation protocol for distributed DBMSs and implemented it in Apache HBase. This work presents a performance evaluation of the protocol. We measured the performance of a single-node system and modeled the performance of a distributed HBase cluster.